Bump dflash-mlx from 0.1.0 to 0.1.7 by dependabot[bot] · Pull Request #79 · youssofal/MTPLX

dependabot · 2026-05-23T10:33:02Z

Bumps dflash-mlx from 0.1.0 to 0.1.7.

Release notes

dflash-mlx v0.1.7

Big adaptive-runtime update focused mostly on Qwen3.6 27B 4-bit and real long-context usage.

The main change is the adaptive verify policy.

DFlash normally drafts a block and asks the target to verify it. Large verify blocks, like M=16, are great when acceptance is strong because one target pass can commit many tokens. But when the draft gets weaker, large blocks waste target work on suffix tokens that will be rejected anyway.
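To make that trade-off concrete, here is a rough back-of-the-envelope sketch. It is not taken from dflash-mlx. It assumes each drafted token is accepted independently with probability `p` and that verification keeps only the longest accepted prefix; both are simplifying assumptions, and `expected_accepted` is a hypothetical helper name.

```python
def expected_accepted(p: float, m: int) -> float:
    """Expected accepted-prefix length of an m-token draft block.

    Toy model (an assumption, not the dflash-mlx acceptance rule):
    token k is only reachable if tokens 1..k-1 were all accepted,
    so the prefix reaches length k with probability p**k.
    """
    return sum(p ** k for k in range(1, m + 1))


for p in (0.9, 0.5):
    for m in (4, 16):
        acc = expected_accepted(p, m)
        # m - acc is the expected number of verified positions that get thrown away
        print(f"p={p} M={m}: accepted ~{acc:.2f}, rejected suffix ~{m - acc:.2f}")
```

Under this toy model, at p = 0.5 a 16-token block accepts barely more than a 4-token block on average (just under one token either way), while asking the target to verify four times as many positions.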

v0.1.7 makes that decision dynamic:

Start from the normal large-block path.

Watch recent acceptance, tokens per cycle, and real cycle wall cost.

If the large block stops paying off, drop to a smaller M=4 verify block.

Stay there for a burst while acceptance stabilizes.

Periodically probe back up to the large block.

Resume M=16 only when the probe is actually better than the reduced path.

So the goal is not just “higher acceptance”. The goal is committed tokens per real unit of time. M=4 can be better when acceptance is low; M=16 can be better when the draft is stable. The runtime now measures that instead of forcing one block size everywhere.
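The policy described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the dflash-mlx implementation: the class name, the thresholds (`window`, `min_tokens_per_cycle`, `burst`), and the single-cycle probe are all hypothetical. Only the block sizes 16 and 4 come from the release notes.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import fmean

LARGE_M, SMALL_M = 16, 4  # block sizes named in the release notes


@dataclass
class AdaptiveVerifyPolicy:
    # All thresholds are hypothetical tuning knobs, not dflash-mlx values.
    window: int = 8                    # recent large-block cycles to average over
    min_tokens_per_cycle: float = 3.0  # below this the large block "stops paying off"
    burst: int = 16                    # cycles to stay at SMALL_M before probing
    mode: str = "large"                # "large", "reduced", or "probe"
    _tokens: deque = field(default_factory=deque)
    _reduced_rate: deque = field(default_factory=deque)
    _burst_left: int = 0

    def block_size(self) -> int:
        """Block size to use for the next verify cycle."""
        return SMALL_M if self.mode == "reduced" else LARGE_M

    def record(self, committed: int, seconds: float) -> None:
        """Feed back the outcome of the cycle that just ran (seconds > 0)."""
        rate = committed / seconds  # committed tokens per real second
        if self.mode == "large":
            self._tokens.append(committed)
            if len(self._tokens) > self.window:
                self._tokens.popleft()
            full = len(self._tokens) == self.window
            if full and fmean(self._tokens) < self.min_tokens_per_cycle:
                self._enter_reduced()
        elif self.mode == "reduced":
            self._reduced_rate.append(rate)
            self._burst_left -= 1
            if self._burst_left == 0:
                self.mode = "probe"  # next cycle tries the large block again
        else:  # probe: resume the large block only if it beat the reduced path
            if rate > fmean(self._reduced_rate):
                self.mode = "large"
                self._tokens.clear()
            else:
                self._enter_reduced()

    def _enter_reduced(self) -> None:
        self.mode = "reduced"
        self._burst_left = self.burst
        self._reduced_rate.clear()
        self._tokens.clear()
```

A decode loop would call `block_size()` before each cycle and `record(committed, seconds)` after it. The probe compares committed tokens per second rather than acceptance alone, which matches the stated goal of measuring real throughput. A real runtime would likely probe over more than one cycle to reduce noise.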

Highlights:

retuned adaptive verify for long-context / agentic decode

richer metrics: tokens/cycle, adaptive block state, per-mode/per-block speed, CopySpec counters

/metrics now exposes real decode average tok/s plus logical / real / restored prefill rates

AIME25 benchmark suite with exact integer scoring

Qwen thinking default now follows tokenizer/request behavior instead of forcing thinking off

GDN recurrent exactness fixes around state dtype in gated-delta tape/tree kernels

public README benchmark artifacts for Qwen3.6 27B 4-bit at 1k / 2k / 4k / 8k / 16k

Measured README prompt, Qwen3.6 27B 4-bit, M5 Max, stock mlx_lm baseline, repeat=3, cooldown=120s, no EOS:

1024: baseline 33.26 tok/s, DFlash 98.05 tok/s, 2.95x

2048: baseline 32.34 tok/s, DFlash 90.67 tok/s, 2.81x

4096: baseline 30.58 tok/s, DFlash 93.55 tok/s, 3.06x

8192: baseline 26.03 tok/s, DFlash 79.12 tok/s, 3.04x

16384: baseline 21.50 tok/s, DFlash 60.77 tok/s, 2.78x
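As a quick sanity check, the speedup column can be recomputed as DFlash tok/s divided by baseline tok/s. The snippet below uses only the figures listed above; `rows` and `speedup` are illustrative names. Recomputing from the rounded rates gives about 2.80x at 2048 and 2.83x at 16384, slightly different from the listed 2.81x and 2.78x, which may mean the release computed its ratios from unrounded or per-run numbers.

```python
# (context length, baseline tok/s, DFlash tok/s, speedup as listed above)
rows = [
    (1024, 33.26, 98.05, 2.95),
    (2048, 32.34, 90.67, 2.81),
    (4096, 30.58, 93.55, 3.06),
    (8192, 26.03, 79.12, 3.04),
    (16384, 21.50, 60.77, 2.78),
]


def speedup(baseline: float, dflash: float) -> float:
    """Ratio of DFlash decode rate to the baseline decode rate."""
    return dflash / baseline


for ctx, base, fast, listed in rows:
    print(f"{ctx:>5}: recomputed {speedup(base, fast):.2f}x, listed {listed:.2f}x")
```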

This release is mostly about making DFlash more usable and observable in real runs, especially long-context coding/agentic workloads.

dflash-mlx v0.1.6

Large runtime, server, and agentic-workflow release since v0.1.5, including the v0.1.5.1 fixes.

Highlights

Reworked runtime ownership around typed runtime config, RuntimeBundle, ServerRuntime, target adapters, draft loading, cache management, and observability.

Default verify policy is now adaptive; fixed DFlash verification is available as --verify-mode dflash.

Added explicit verify modes: adaptive, dflash, ddtree, and off.

Added DDTree branch verification mode for Qwen target paths.

Added internal CopySpec candidate reuse for repeated-token continuation from prompt/generated history.

Added target-owned Qwen and Gemma4 backend routing, with unknown model families failing closed instead of falling into generic logic.

... (truncated)

Commits

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [dflash-mlx](https://github.com/bstnxbt/dflash-mlx) from 0.1.0 to 0.1.7.
- [Release notes](https://github.com/bstnxbt/dflash-mlx/releases)
- [Commits](https://github.com/bstnxbt/dflash-mlx/commits/v0.1.7)

---
updated-dependencies:
- dependency-name: dflash-mlx
  dependency-version: 0.1.7
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

dependabot Bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels May 23, 2026

dependabot Bot requested a review from youssofal as a code owner May 23, 2026 10:33

dependabot Bot mentioned this pull request May 23, 2026

Bump dflash-mlx from 0.1.0 to 0.1.6 #69

Closed

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/dflash-mlx-0.1.7
